ABSTRACT

This  paper  presents  techniques  to  aid  in  the  design  of  an 
observer  test.  To  select  an  appropriate  number  of 
observation  opportunities,  the  test  designer  can  use  the 
Fisher  Exact  Test  to  calculate  the  number  of  observation 
opportunities  required  so  that  a  given  experimental 
difference  in  probability  of  detection  will  be  statistically 
significant.  Alternatively,  the  designer  can  select  the 
number  of  observation  opportunities  to  guard  against 
rejecting  a  real  difference  in  probability  of  detection.  These 
criteria  require  calculating  the  probabilities  of  so-called 
Type  I  and  Type  II  errors  in  hypothesis  testing. 


Introduction 

In  a  previous  paper  (1),  I  discussed  the  advantages  of  the  Fisher  Exact  Test 
over  the  technique  of  fitting  logistic  curves  for  the  analysis  of  data  from  observer 
tests.  The  Fisher  Exact  Test  offers  the  advantage  of  yielding  quantitative  measures 
of  the  significance  of  observed  differences  in  detectability  instead  of  just  curves 
fitted  to  data.  In  this  paper,  I  will  discuss  the  use  of  the  Fisher  Exact  Test  in  the 
design  and  planning  of  observer  tests. 

The  Experimental  Situation 

Figure  1  illustrates  a  typical  test  setup  for  a  test  of  detectability.  Observers 
are  stationed  at  a  fixed  site  and  attempt  to  detect  a  vehicle  in  their  field  of  view. 

For  each  observation  opportunity,  the  test  personnel  record  the  number  of 
detections.  In  analyzing  the  data,  the  analyst  groups  the  observations  into  range 
bins and compares the proportion of detections for each test vehicle.

For such a field test, the test designer must select the optimum number of
observation  opportunities.  The  designer  must  balance  collecting  enough  data  to 
draw  valid  conclusions  against  the  high  cost  of  supporting  vehicles  and  personnel 
at  a  test  site. 

Analysis  Techniques 

Consider  the  contingency  table  in  Figures  2  to  5  as  an  example  of  the  use 
of  the  Fisher  Exact  Test  for  the  analysis  of  the  significance  of  differing 
proportions.  The  Fisher  Exact  Test  uses  the  hypergeometric  distribution  to 
calculate  the  probability  of  this  or  a  more  extreme  contingency  table  under  the  null 
hypothesis.  The  null  hypothesis  states  that  Vehicle  A  and  Vehicle  B  have  the  same 
probability  of  detection.  For  the  table  in  the  figures,  the  test  calculates  a 
probability  of  9.2  %,  too  high  to  reject  the  null  hypothesis  with  95  %  confidence. 
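
The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the figures: the function name and the example counts are hypothetical, and the sketch assumes a one-sided test (Vehicle A no more detectable than Vehicle B) with the row and column totals of the table held fixed.

```python
from math import comb


def fisher_one_sided(det_a, n_a, det_b, n_b):
    """One-sided Fisher Exact Test p-value for a 2x2 detection table.

    det_a, det_b: detections of Vehicle A and Vehicle B.
    n_a, n_b: observation opportunities for each vehicle.
    Returns the probability, under the null hypothesis of equal Pd,
    of a table with det_a or fewer detections of Vehicle A when the
    row and column totals are held fixed.
    """
    total_det = det_a + det_b            # fixed column total (detections)
    total_obs = n_a + n_b                # grand total of opportunities
    k_min = max(0, total_det - n_b)      # fewest A detections the margins allow
    denom = comb(total_obs, total_det)
    p_value = 0.0
    for k in range(k_min, det_a + 1):
        # Hypergeometric probability of exactly k detections of Vehicle A
        p_value += comb(n_a, k) * comb(n_b, total_det - k) / denom
    return p_value


# Hypothetical counts, chosen only for illustration (not the data of Figure 2)
p = fisher_one_sided(det_a=9, n_a=20, det_b=13, n_b=20)
print(f"one-sided p = {p:.3f}")
```

If the returned value exceeds 0.05, the null hypothesis cannot be rejected with 95% confidence, as in the example in the text.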

Note that in this example, the vehicles differed by 0.17 in Pd, but the
difference still was not statistically significant. To ensure that a given experimental
difference in Pd will in fact be significant, the test designer must select a suitably
large number of observation opportunities.

Criterion  Based  on  Significance  of  Experimental  Difference 

Using  the  Fisher  Exact  Test,  the  test  designer  can  select  the  number  of 
observations, N, so that a given experimental difference in detectability (ΔPd) will
be statistically significant at a given confidence level. For example, the designer might
require that an experimental difference of 0.15 be significant with 95% confidence.
Figure 6, created using the Fisher Exact Test and its chi-squared approximation,
shows the number of observation opportunities required so that a given ΔPd is
significant at the 95% confidence level. Note that for a 0.15 difference, 85
observation opportunities are required for each vehicle.
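
A search of this kind can be sketched as follows. The sketch rests on assumptions the text does not state: a one-sided Fisher Exact Test, equal numbers of opportunities for both vehicles, and observed proportions centered on a Pd of 0.5 (the hypothetical `center_pd` parameter). Because the result depends on these choices and on rounding the detection counts to whole numbers, it should not be expected to reproduce Figure 6 exactly.

```python
from math import comb


def fisher_one_sided(det_a, n_a, det_b, n_b):
    """One-sided Fisher p-value: P(det_a or fewer A detections | margins)."""
    total_det = det_a + det_b
    denom = comb(n_a + n_b, total_det)
    k_min = max(0, total_det - n_b)
    return sum(comb(n_a, k) * comb(n_b, total_det - k)
               for k in range(k_min, det_a + 1)) / denom


def min_opportunities(delta_pd, center_pd=0.5, alpha=0.05, n_max=1000):
    """Smallest N per vehicle at which an observed difference of about
    delta_pd, centered on center_pd (an assumption), is significant."""
    for n in range(2, n_max + 1):
        # Observed detection counts for each vehicle, rounded to integers
        det_a = round(n * (center_pd - delta_pd / 2))
        det_b = round(n * (center_pd + delta_pd / 2))
        if fisher_one_sided(det_a, n, det_b, n) < alpha:
            return n
    return None  # no N up to n_max satisfies the criterion


for delta in (0.10, 0.15, 0.20, 0.30):
    print(delta, min_opportunities(delta))
```

Smaller differences require more opportunities, which is the trend Figure 6 describes.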

A  Second  Criterion  for  the  Number  of  Observation  Opportunities 

For  a  second  criterion  for  choosing  N,  consider  Figure  7,  a  table  of  the 
possible  outcomes  of  testing  the  null  hypothesis  that  the  two  vehicles  have  the 
same  Pd.  A  Type  I  error  occurs  if  we  conclude  that  Vehicle  A  has  a  lower  Pd  than 
Vehicle  B  when  in  fact  they  have  the  same  Pd.  The  probability  of  a  Type  I  error  is 
defined  as  the  significance,  p,  of  the  test.  For  95%  confidence,  p  must  be  less  than 
5%. 


A  vehicle  designer,  however,  would  be  interested  in  a  low  probability  of  a 
Type  II  error.  A  Type  II  error  occurs  when  we  accept  the  null  hypothesis  even 
though  Vehicle  A  really  has  a  lower  Pd  than  Vehicle  B.  In  the  case  of  a  Type  II 
error,  the  test  has  missed  an  effective  treatment.  For  example,  the  designer  may 
require that if the underlying detectability difference is 0.15, then the chance that the
difference will not be found significant in the experiment (where the experimental
detectability difference may be more or less than 0.15) will be less than 5%.

Figure 8 plots, as a function of the underlying difference in Pd between the two
vehicles, the probability that the experiment yields a difference of 13 or more
detections when each vehicle is observed 85 times. A difference of 13 detections is
enough to declare the difference significant with 95% confidence. Figure 9 plots
the same data as the probability of committing a Type II error. As the underlying
difference increases, it becomes more and more likely that a significant difference
will be observed. However, if the underlying difference is 0.15, then the chance of
committing a Type II error is 50% for an N of 85. Only if the underlying difference
is 0.28 or more can the designer be 95% sure that the test will not erroneously miss
a real difference in probability of detection.
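
The probabilities behind this argument can be approximated with a direct binomial calculation. The sketch below is illustrative only. It assumes that the detection counts of the two vehicles are independent binomial variables, that a difference of 13 or more detections is the fixed criterion for significance (as in the text), and that the two underlying Pd values are centered on 0.5. The text does not state a baseline Pd, so that last choice and all function names are hypothetical.

```python
from math import comb


def binom_pmf(n, k, p):
    """Probability of exactly k detections in n opportunities."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


def prob_significant(n, pd_a, pd_b, min_diff):
    """P(detections_B - detections_A >= min_diff) for independent binomials."""
    pmf_a = [binom_pmf(n, k, pd_a) for k in range(n + 1)]
    pmf_b = [binom_pmf(n, k, pd_b) for k in range(n + 1)]
    total = 0.0
    for a in range(n + 1):
        # Vehicle B needs at least a + min_diff detections
        for b in range(max(0, a + min_diff), n + 1):
            total += pmf_a[a] * pmf_b[b]
    return total


def type_ii_error(n, pd_a, pd_b, min_diff):
    """Chance that a real difference is not found significant."""
    return 1.0 - prob_significant(n, pd_a, pd_b, min_diff)


# Underlying differences centered on Pd = 0.5 (an assumption)
for delta in (0.0, 0.15, 0.28):
    beta = type_ii_error(85, 0.5 - delta / 2, 0.5 + delta / 2, 13)
    print(f"underlying difference {delta:.2f}: P(Type II) = {beta:.3f}")
```

Under these assumptions, the Type II error is close to one half when the underlying difference is about equal to the experimental difference needed for significance, because the observed difference is about as likely to fall below the threshold as above it.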

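Applying similar analysis to other numbers of observation opportunities
yields Figure 10. Note that requiring the chance of a Type II error to be less than
5% reduces the sensitivity of the test by half for a given number of opportunities:
the smallest difference in Pd that the test can be relied on to find is about twice
the smallest experimental difference that is significant.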

Conclusions 

The Fisher Exact Test is useful in planning observer tests. The criteria used
to design the test have a strong influence on the size of difference in Pd that the test
can be expected to find significant. The minimum expected detectable difference in
Pd based on control of Type II error is double the minimum detectable difference
based on significance of the observed difference.

Figure 1. Experimental setup for an observer test.

Figure 2. An example of a contingency table.

Figure 3. The hypergeometric distribution gives the probability of a given table. [Panel title: "Use Hypergeometric Distribution:"]

Figure 4. The Fisher exact test calculates the probability of the given table or a more extreme table.

Figure 5. The Fisher Test accepts the null hypothesis. [Text recovered from the figure: "We reject the null hypothesis." "We cannot be sure with 95% confidence that Vehicle A is less detectable than Vehicle B."]

Figure 6. The number of observation opportunities required for a given difference in detection probability to be significant with 95% confidence. [Plot title: "Number of Observation Opportunities For Significance with 95% Confidence"; axis label: "Number of Opportunities"]


Possible Outcomes of Hypothesis Testing

                                 Is Vehicle A Less Detectable Than Vehicle B?
Decision:                        No               Yes
Accept Null Hypothesis           Correct          Type II Error
Accept Alternative Hypothesis    Type I Error     Correct

Figure 7. Table defining Type I and II errors in hypothesis testing.

Figure 8. The probability that the test will yield a statistically significant difference in detection probability. [Plot label fragment: "of 13 or More Detections"]

Figure 9. The probability of committing a Type II error with 85 observation opportunities for each vehicle.

Figure 10. Number of observation opportunities required to meet test criteria.

Summary

Test with Fixed Observers


Fixed  Observers 


Example  of  the  Fisher  Exact  Test 


•  Is  the  experimental  contingency  table 
unlikely  under  the  null  hypothesis? 

Use Hypergeometric Distribution:

We reject the null hypothesis.


We  cannot  be  sure  with  95%  confidence 
that  Vehicle  A  is  less  detectable  than 
Vehicle  B. 


Requirement  for  Significance 


•  How  many  observation  opportunities  are 
required  for  a  given  experimental  difference 
in  Pd  to  be  significant? 

•  For  example,  if  we  want  an  experimental 
difference  of  0.15  to  be  significant,  how 
many  observation  opportunities  do  we 
require? 


Number  of  Observation  Opportunities  For 
Significance  with  95%  Confidence 


A Second Criterion


•  Vehicle  designer  wants  to  avoid  missing 
differences  in  probability  of  detection. 

•  If  there  are  85  observation  opportunities, 
what  underlying  difference  in  Pd  is  required 
for  a  5%  or  less  chance  of  missing  an 
effective  treatment? 


Possible Outcomes of Hypothesis Testing

Probability(Type I Error) = p (the significance)


An  Example 


•  85  Observation  Opportunities  for  Vehicle  A 

•  85  Observation  Opportunities  for  Vehicle  B 

•  A  difference  is  significant  if  Vehicle  A  has 
13  fewer  detections  than  Vehicle  B 

•  How  much  must  the  underlying  Pd’s  differ 
for  us  to  be  95%  sure  the  experiment  will 
yield  a  difference  of  13  or  more? 

Probability of Type II Error


Requirement for Probability of
Type II Error Less Than 5%


For 85 observations of each
vehicle, the Pd’s must differ by
0.28 or more for the probability
of a Type II error to be less than 5%.


Number  of  Opportunities  to  Meet 
Test  Criteria 


Pd  Difference 


Conclusions 


•  For observer trials, the Fisher Exact Test is
useful for determining sample size.

•  The minimum detectable difference in Pd
based on control of Type II error is double
the minimum detectable difference based
on significance of the observed difference.